We are continuously reviewing the original iMAP[1] and integrating it with Snakemake and GitHub Actions to facilitate reproducible microbiome data analysis.




General Overview


We aim to keep fostering continuous integration and the development of highly reproducible workflows.


Snakemake rule-graph

  • A Snakemake workflow is typically defined by specifying rules.
  • The rule graph shows graphically how the rules are linked through their input and output files.
  • Snakemake automatically determines the dependencies between the rules and builds a DAG (Directed Acyclic Graph), which it can export in Graphviz dot format.
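The dependency inference described above can be sketched in a few lines of Python. This is a simplified illustration, not Snakemake's own implementation: the rule names and file names are hypothetical, and real Snakemake also resolves wildcard patterns, which this sketch ignores.

```python
# Hypothetical rules: names and file names are illustrative only.
RULES = {
    "make_contigs": {"input": ["raw_reads.fastq"], "output": ["contigs.fasta"]},
    "align": {"input": ["contigs.fasta", "silva.align"], "output": ["aligned.fasta"]},
    "cluster": {"input": ["aligned.fasta"], "output": ["otus.list"]},
}


def infer_edges(rules):
    """Link rule A -> rule B whenever an output of A is an input of B."""
    # Map every output file to the rule that produces it.
    producers = {}
    for name, spec in rules.items():
        for path in spec["output"]:
            producers[path] = name
    # An input produced by another rule creates a dependency edge.
    edges = set()
    for name, spec in rules.items():
        for path in spec["input"]:
            if path in producers:
                edges.add((producers[path], name))
    return sorted(edges)


def to_dot(edges):
    """Render the edges as Graphviz DOT text."""
    lines = ["digraph rulegraph {"]
    lines += [f'    "{src}" -> "{dst}";' for src, dst in edges]
    lines.append("}")
    return "\n".join(lines)


EDGES = infer_edges(RULES)
print(to_dot(EDGES))
```

Inputs that no rule produces (here `raw_reads.fastq` and `silva.align`) are treated as files that must already exist. In an actual workflow, the graph is obtained from Snakemake itself, for example with `snakemake --rulegraph | dot -Tsvg > rulegraph.svg`.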



IMAP-PART3: Snakemake workflow



Screenshot of interactive snakemake report

The Snakemake HTML report can be viewed in any compatible browser, such as Chrome, to explore the workflow and its associated statistics. You can close the left sidebar to get a better view of the display.
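As a small convenience, a locally generated report can be opened from Python. The sketch below assumes the report was written to a file named `report.html` (for example with `snakemake --report report.html`); the file name and the helper functions are illustrative, not part of the iMAP workflow.

```python
import webbrowser
from pathlib import Path


def report_uri(report_path):
    """Return a file:// URI for a local HTML report."""
    return Path(report_path).resolve().as_uri()


def open_report(report_path="report.html"):
    """Open the report in the default browser.

    Returns False if the file does not exist; otherwise returns
    whatever webbrowser.open reports.
    """
    path = Path(report_path)
    if not path.is_file():
        return False
    return webbrowser.open(report_uri(path))
```

Calling `open_report()` from the directory that contains the report opens it in the system's default browser.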




Preliminary OTU Analysis using Mothur

First start mothur, then run the following commands in the mothur CLI to generate the input data for downstream analysis.

Appendix

Mothur reference databases

  1. Mothur-based SILVA reference files[5].
  2. Mothur-based RDP reference files[6]. Note: The RDP database is used to classify 16S rRNA gene sequences to the genus level.
  3. ZymoBIOMICS Microbial Community Standard (Cat # D6306)[7]. The ZymoBIOMICS Microbial Community DNA Standard is designed to assess bias, errors, and other artifacts introduced after the nucleic acid purification step.



Troubleshooting (in progress)

  1. Are chimeras removed by default in newer versions?
    • Yes. Chimeras are removed by default. You can still run the remove.seqs command without error, but it is not necessary. Removal of chimeric sequences is explained here.
  2. Mothur dist.seqs is taking too long. Possible causes:
    • Merged reads are too long, probably over 300 bp.
    • Paired reads do not overlap when they are merged.
    • Too many unique representative sequences, probably caused by the lack of overlap.
    • Insufficient computing power, which suggests using an HPC system or a cluster.
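The first three causes above can be checked quickly before running dist.seqs. The following is a minimal diagnostic sketch in plain Python, assuming the merged reads are available as FASTA text. The function names, the example reads, and the threshold are illustrative. The pairwise count is the number of sequence pairs among the unique sequences, which grows quadratically and gives a rough sense of why many unique sequences slow the distance calculation down.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def summarize(lines, max_length=300):
    """Count reads, over-long reads, unique sequences, and sequence pairs."""
    seqs = [seq.upper() for _, seq in read_fasta(lines)]
    n_unique = len(set(seqs))
    return {
        "reads": len(seqs),
        "too_long": sum(len(s) > max_length for s in seqs),
        "unique": n_unique,
        # Number of distinct pairs among the unique sequences: n(n-1)/2.
        "pairwise_distances": n_unique * (n_unique - 1) // 2,
    }


# Tiny made-up example; max_length is lowered so one read counts as too long.
EXAMPLE = """>read1
ACGTACGT
>read2
ACGTACGT
>read3
ACGTTCGTAA
""".splitlines()

SUMMARY = summarize(EXAMPLE, max_length=9)
print(SUMMARY)
```

With a real file, the same function can be applied to an open file handle, for example `with open("merged.fasta") as handle: summarize(handle)`, where `merged.fasta` is a hypothetical file name.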




References

[1]
Buza, T. M., Tonui, T., Stomeo, F., Tiambo, C., Katani, R., Schilling, M., … Kapur, V. (2019). iMAP: An integrated bioinformatics and visualization pipeline for microbiome data analysis. BMC Bioinformatics, 20. https://doi.org/10.1186/S12859-019-2965-4
[2]
Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., … Nahnsen, S. (2021). Sustainable data analysis with snakemake. F1000Research, 10. https://doi.org/10.12688/f1000research.29032.2
[3]
Snakemake. (2023). Snakemake. Retrieved from https://snakemake.readthedocs.io/en/stable
[4]
Close, W. L. (2020). Mothur 16S v4 analysis pipeline. Retrieved from https://github.com/wclose/mothurPipeline
[5]
Mothur-based SILVA reference files. Retrieved from https://mothur.org/wiki/silva_reference_files/
[6]
Mothur-based RDP reference files. Retrieved from https://mothur.org/wiki/rdp_reference_files/
[7]
ZymoBIOMICS microbial community DNA standard (cat # D6306). Retrieved from https://www.zymoresearch.com/zymobiomics-community-standard